A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models
نویسندگان
چکیده
We conducted a comparative analytic study on the contextdependent Gaussian mixture hiddenMarkov model (CD-GMMHMM) and deep neural network hidden Markov model (CDDNN-HMM) with respect to the phone discrimination and the robustness performance. We found that the DNN can significantly improve the phone recognition performance for every phoneme with 15.6% to 39.8% relative phone error rate reduction (PERR). It is particularly good at discriminating certain consonants, which are found to be “hard” in the GMM. On the robustness side, the DNN outperforms the GMM at all SNR levels, across different devices, and under all speaking rate with nearly uniform improvement. The performance gap with respect to different SNR levels, distinct channels, and varied speaking rate remains large. For example, in CD-DNNHMM, we observed 1∼2% performance degradation per 1dB SNR drop; 20∼25% performance gap between the best and least well performed devices; 15∼30% relative word error rate increase when the speaking rate speeds up or slows down by 30% from the “sweet” spot. Therefore, we conclude the robustness remains to be a major challenge in the deep learning acoustic model. Speech enhancement, channel normalization, and speaking rate compensation are important research areas in order to further improve the DNN model accuracy.
منابع مشابه
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
متن کاملOn Improving Acoustic Models for TORGO Dysarthric Speech Database
Assistive technologies based on speech have been shown to improve the quality of life of people affected with dysarthria, a motor speech disorder. Multiple ways to improve Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network (DNN) based automatic speech recognition (ASR) systems for TORGO database for dysarthric speech are explored in this paper. Past attempts in develop...
متن کاملDeep Neural Network Frontend for Continuous EMG-Based Speech Recognition
We report on a Deep Neural Network frontend for a continuous speech recognizer based on Surface Electromyography (EMG). Speech data is obtained by facial electrodes capturing the electric activity generated by the articulatory muscles, thus allowing speech processing without making use of the acoustic signal. The electromyographic signal is preprocessed and fed into the neural network, which is...
متن کاملUsing deep neural networks to improve proficiency assessment for children English language learners
We investigated the use of context-dependent deep neural network hidden Markov models, or CD-DNN-HMMs, to improve speech recognition performance for a better assessment of children English language learners (ELLs). The ELL data used in the present study was obtained from a large language assessment project administered in schools in a U.S. state. Our DNN-based speech recognition system, built u...
متن کاملApplication of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014